The freetext matching algorithm: a computer program to extract diagnoses and causes of death from unstructured text in electronic health records

Authors

  • Anoop Shah
  • Carlos Martínez
  • Harry Hemingway
Abstract

BACKGROUND Electronic health records are invaluable for medical research, but much information is stored as free text rather than in a coded form. For example, in the UK General Practice Research Database (GPRD), causes of death and test results are sometimes recorded only in free text. Free text can be difficult to use for research if it requires time-consuming manual review. Our aim was to develop an automated method for extracting coded information from free text in electronic patient records.

METHODS We reviewed the electronic patient records in GPRD of a random sample of 3310 patients who died in 2001, to identify the cause of death. We developed a computer program called the Freetext Matching Algorithm (FMA) to map diagnoses in text to the Read Clinical Terminology. The program uses lookup tables of synonyms and phrase patterns to identify diagnoses, dates and selected test results. We tested it on two random samples of free text from GPRD (1000 texts associated with death in 2001, and 1000 general texts from cases and controls in a coronary artery disease study), comparing the output to the U.S. National Library of Medicine's MetaMap program and the gold standard of manual review.

RESULTS Among 3310 patients registered in the GPRD who died in 2001, the cause of death was recorded in coded form in 38.1% of patients, and in the free text alone in 19.4%. On the 1000 texts associated with death, FMA coded 683 of the 735 positive diagnoses, with precision (positive predictive value) 98.4% (95% confidence interval (CI) 97.2, 99.2) and recall (sensitivity) 92.9% (95% CI 90.8, 94.7). On the general sample, FMA detected 346 of the 447 positive diagnoses, with precision 91.5% (95% CI 88.3, 94.1) and recall 77.4% (95% CI 73.2, 81.2), which was similar to MetaMap.

CONCLUSIONS We have developed an algorithm to extract coded information from free text in GP records with good precision. It may facilitate research using free text in electronic patient records, particularly for extracting the cause of death.
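
As a rough illustration of the lookup-table idea described in the abstract, the Python sketch below maps phrases in a free-text entry to codes and recomputes the reported recall figures from the stated counts. It is not the FMA program itself: the synonym table, the placeholder codes (which are not real Read codes), and the function names `extract_codes` and `recall` are all assumptions made for this example.

```python
import re

# Hypothetical synonym table: phrase -> placeholder code.
# These are NOT real Read codes and NOT the FMA lookup tables.
SYNONYMS = {
    "myocardial infarction": "CODE_MI",
    "heart attack": "CODE_MI",
    "mi": "CODE_MI",
    "bronchopneumonia": "CODE_BRONCHOPNEUMONIA",
    "stroke": "CODE_STROKE",
}


def extract_codes(text):
    """Return placeholder codes for known phrases, in order of first appearance."""
    lowered = text.lower()
    hits = []
    for phrase, code in SYNONYMS.items():
        # Whole-word match so that "mi" does not fire inside other words.
        match = re.search(r"\b" + re.escape(phrase) + r"\b", lowered)
        if match:
            hits.append((match.start(), code))
    codes = []
    for _, code in sorted(hits):
        if code not in codes:  # report each code once
            codes.append(code)
    return codes


def recall(true_positives, total_positives):
    """Recall (sensitivity): detected positives divided by all true positives."""
    return true_positives / total_positives


print(extract_codes("Died at home. 1a Bronchopneumonia 1b Heart attack"))
print(f"{recall(683, 735):.1%}")  # death-related sample counts from the abstract
print(f"{recall(346, 447):.1%}")  # general sample counts from the abstract
```

The two recall values follow directly from the counts in the abstract (683 of 735 and 346 of 447). The matcher is deliberately minimal: it ignores negation, dates and test results, all of which the abstract says the real program handles.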

Similar resources

Extracting Diagnoses and Investigation Results from Unstructured Text in Electronic Health Records by Semi-Supervised Machine Learning

BACKGROUND Electronic health records are invaluable for medical research, but much of the information is recorded as unstructured free text which is time-consuming to review manually. AIM To develop an algorithm to identify relevant free texts automatically based on labelled examples. METHODS We developed a novel machine learning algorithm, the 'Semi-supervised Set Covering Machine' (S3CM),...

Agreement between diagnoses in hospital records and the cause of death recorded on death certificates at Loghman Hakim Hospital during 1384 (Solar Hijri calendar)

Background & Objectives: There is little doubt about the importance of accurate statistics and reliable information in promoting community health and optimizing health care. Therefore, the existence of a correct, accurate and up-to-date database is an absolute necessity. Accurate identification of the cause in death certificates can make an invaluable contribution to the development of such...

Data Extraction using Content-Based Handles

In this paper, we present an approach and a visual tool, called HWrap (Handle Based Wrapper), for creating web wrappers to extract data records from web pages. In our approach, we mainly rely on the visible page content to identify data regions on a web page. Our extraction algorithm is inspired by the way a human user scans the page content for specific data. In particular, we use text fea...

Adaptive Approximate Record Matching

Typographical data entry errors and incomplete documents produce imperfect records in real-world databases. These errors generate distinct records that belong to the same entity. The aim of Approximate Record Matching is to find multiple records that belong to one entity. In this paper, an algorithm for Approximate Record Matching is proposed that can be adapted automatically with input error...

Extraction of the Longitudinal Movement of the Carotid Artery Wall using Consecutive Ultrasonic Images: a Block Matching Algorithm

Introduction: In this study, a computer analysis method based on a block matching algorithm is presented to extract the longitudinal movement of the carotid artery wall using consecutive ultrasonic images. A window (block) is selected as the reference block in the first frame and the most similar block to the reference one is found in the subsequent frames. Material and Methods: The program was...

Journal title:

Volume 12, Issue

Pages -

Publication date: 2012